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Abstract 

The probability of accepting a candidate move in the hybrid Monte Carlo algorithm 
can be increased by considering a transition to be between windows of several states 
at the beginning and end of the trajectory, with a state within the selected win- 
dow being chosen according to the Boltzmann probabilities. The detailed balance 
condition used to justify the algorithm still holds with this procedure, provided the 
start state is randomly positioned within its window. The new procedure is shown 
empirically to significantly improve performance for a test system of uncoupled os- 
cillators. 



(Figures 2, 3, and 4 are not present in this version.) 
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1 Introduction 



The hybrid Monte Carlo algorithm of Duane, Kennedy, Pendleton, and Roweth Q 
is a method of sampling from complex distributions, such as those encountered in 
statistical physics, that combines the advantages of dynamical methods 0] with 
those of the Metropolis Monte Carlo method 0]. These and related methods are 
reviewed by Toussaint ||. 

The problem is to simulate a system parameterized by a vector, q, of dimension 
N, with a differentiable potential energy function, -E'(q). This energy function 
induces a Boltzmann distribution over q, for which the probability density is 

p(q) = Z^exp^q)) (1) 

where Ze = J r n exp(— E(q)) dq. (A temperature of one is assumed throughout, for 
simplicity.) 

The aim of the simulation is to estimate the expectation of some function, h(q): 

(h) = / h(q)p(q)dq « -$>(q0 (2) 
Jrn nf^ 

where qo,qi, • • • ,qn-i are obtained from the simulation, and are each distributed 
according to the Boltzmann distribution for q (but are not, in general, independent). 

I will first describe the dynamical approach to solving this problem, which suffers 
from systematic error in the sampling, and then describe the hybrid Monte Carlo 
method, which eliminates this error by accepting only some of the dynamical moves. 
Next, I present and justify a generalization of this algorithm in which moves are 
made between windows of states at the beginning and end of a trajectory, rather 
than between single states. Situations in which this algorithm will have a higher 
acceptance probability than the standard algorithm are then illustrated, and an 
empirical comparison is given for systems of uncoupled oscillators. Finally, I discuss 
two variations on the algorithm that may be useful in some circumstances. 

2 The dynamical method 

In the dynamical simulation method, we introduce a momentum vector, p, which, 
like q, has dimension N, and a Hamiltonian function, H(q, p), that incorporates 
both potential and kinetic energy: 

H(q,p) = E{q) + \ |p| 2 (3) 

We then seek to sample from the Boltzmann distribution that H induces on the 
phase space, (q, p), for which the density is 

p(q,p) = Z^ 1 exp(-fl'(q,p)) = p(q)p(p) (4) 
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where Zh = J RN J RN exp(— il(q, p)) dqdp, and 

p(p) = (2vr)- JV / 2 e X p(-i|p| 2 ) (5) 

This sample is generated by simulating an ergodic Markov chain that has the 
Boltzmann distribution for (q, p) as its stationary distribution. Values of q from 
successive states of this Markov chain, whose marginal distribution is that of equa- 
tion ([!]), are used to estimate (h) using equation (||). 

This Markov chain operates by alternating dynamical transitions with stochastic 
transitions. 

The dynamical transitions consist of simulating the system for some predefined 
period in a fictitious time, r, using Hamilton's equations: 

These equations leave H invariant. Furthermore, the volume of a region of phase 
space remains constant as it evolves according to this dynamics (Liouville's the- 
orem). In consequence, the dynamical transitions sample regions of constant H 
without bias. 

The stochastic transitions allow regions with different values of H to be explored. 
They consist of replacing p with a value picked from its Boltzmann distribution 
(equation @). Such transitions clearly leave the Boltzmann distribution of (q, p) 
with respect to H invariant. The presence of these stochastic transitions will also 
usually be enough to ensure that the Markov chain is ergodic — i.e. that the system 
can move to any point in phase space. 

It is generally desirable that the trajectories simulated in the dynamical tran- 
sitions be long enough that they reach configurations almost independent of their 
starting configurations. This avoids the slow exploration, at the rate of a random 
walk, that would result if the direction of motion were frequently randomized by 
stochastic transitions. 

In practice, the dynamics must be simulated with some finite step size. The 
"leapfrog" method is generally used, with the following steps being iterated some 
predefined number of times, L, with some specified step size, e: 

p(r+|) = p(r) - §V£(q(r)) (8) 
q(r + e) = q(r) + ep(r+§) (9) 
p(r + e) = p(r + f) - |V%(r + £ )) (10) 

This discretized mapping still preserves phase space volume exactly. However, with 
a finite e, it does not leave H exactly constant. This will introduce some systematic 
error into the sampling. 
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3 The hybrid Monte Carlo algorithm 



The systematic error of the dynamical method is eliminated in the hybrid Monte 
Carlo algorithm [0] by considering the end-point of the trajectory found with the 
leapfrog method to be merely a candidate for the next state of the Markov chain, 
to be accepted or rejected as in the Metropolis Monte Carlo algorithm 

Acceptance or rejection of the candidate state is based on the amount, AH, by 
which H for the candidate state exceeds H for the current state. The probability 
of acceptance, a(AH), is given by 

a(AH) = min(l,exp(-A#)) (11) 

Thus candidate states with lower H are always accepted, while those with higher 
H are accepted with probability exp(— AH). If the candidate state is rejected, the 
new state is the same as the current state (and is counted again in the average of 
equation (||)). 

The validity of this procedure for producing a sample from the Boltzmann distri- 
bution is more easily seen if we imagine that the trajectory for a dynamical transition 
is computed using a value for e whose sign is chosen at random, with positive and 
negative values being equally likely. The leapfrog method (equations (||) to ([h])) is 
time-reversible, so that a "forward" trajectory, with a positive e, and a "backward" 
trajectory, with the corresponding negative e, are inverses of each other, a fact cru- 
cial to the justification of the algorithm. In fact, usual practice is to always use a 
positive e, since an effect equivalent to randomly choosing its sign is produced in 
any case by the randomization of the direction of p in the stochastic transitions. 

To show that the Boltzmann distribution (equation (|j)) is invariant under such 
dynamical transitions, it suffices to show that these transitions satisfy the condition 
known as "detailed balance" — that the probability of a transition from A to B 
occurring is the same as that for a transition from B to A, given that the start state 
is Boltzmann distributed. 

To see that detailed balance holds, consider a small region of volume 5V around 
the point A = {<\a-,Pa)- Suppose that a forward trajectory from A leads to the 
point B = (qs, Pb)- The other points in the region around A will lead to a region 
around B, which will also have volume 5V, since the leapfrog method preserves 
phase space volume. Due to time reversibility, a backward trajectory from B will 
lead to A. The detailed balance condition with respect to the regions around A and 
B can now be written as 

p(<U,Pa)8V ■ \ ■ a(H{a LB ,p B )-H(o LA ,p A )) 

= p(<Ib,Pb)5V ■ \ ■ o,(H(cia,Pa)-H(cib,Pb)) (12) 

The left side of the above equation is the probability of moving from the region 
around A to the region around B. The first factor is the Boltzmann probability for 
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being in the region around A at the start, the second factor (^) is the probability of 
selecting a forward direction for the trajectory, and the third factor is the probability 
that this trajectory will be accepted. The right side expresses the probability of 
moving from the region around B to the region around A in analogous fashion. The 
equality of these two probabilities is seen by substituting from equations (||) and 
(pd[). The detailed balance condition can be similarly verified for the case where a 
backward trajectory from A is chosen. 

The Boltzmann distribution is also invariant with respect to the stochastic tran- 
sitions, which simply replace p with a value chosen from its Boltzmann distribution. 
Thus, if (as we expect) the Markov chain is ergodic, it will have the Boltzmann dis- 
tribution as its unique stationary distribution. 

4 An acceptance procedure using windows 

The standard hybrid Monte Carlo algorithm can be generalized to consider a window 
of states at the end of the trajectory as candidate destinations for a dynamical 
transition, rather than just a single end state. In order to maintain detailed balance, 
a possible move to this window must be considered in relation to an equal-sized 
window around the current state, with the location of the current state within that 
window being determined randomly. Due to this later requirement, the trajectory 
may have to be computed for some number of steps in the reverse of its primary 
direction. Acceptance or rejection of a move between windows is based on the total 
probability of the states they contain. Whichever window is selected, a particular 
state within that window is then picked according to the Boltzmann probabilities. 

This procedure can significantly increase the probability of accepting a move, as 
will be discussed in Section 6. First, though, I will define the algorithm in more 
detail. 

To begin, values for the total number of steps in the trajectory, L, the base step 
size, eo, and the window size, W, are selected from some fixed distribution, with 
1 < W < L + 1. A forward or backward direction, A, for the trajectory is then 
chosen, with equal probabilities for A = +1 and A = —1, and an offset, K, for the 
current state within the start window is selected, with K £ {0, . . . ,W — 1}, each 
possible value being equally likely. 

A reverse portion of the trajectory is then computed by applying the leapfrog 
method with a step size of e = — Aeo, beginning with the start state, X(0). This is 
done for K iterations, producing states labeled X{— 1), . . . , X(— K). The start state 
is then restored, and the leapfrog method is applied with a step size of e = +Aeo for 
L — K iterations, producing states -^(1), • • • , X(L — K). 

The "reject" window at the front of the trajectory, around the current state, is 
defined as the set K = {X(-K), . . .,X(-K + W - 1)}. The "accept" window at 
the far end of the trajectory is defined as A = {X(L — K — W + 1), . . . , X(L — K)}. 
These windows may overlap. 
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The "free energy" for a window is defined as follows: 

F(W) = -log( J2 e M-H(X))) (13) 

The free energies are used to decide whether the next state will come from the 
accept window or the reject window. Using the acceptance function of equation (p]), 
the accept window is selected with probability a(AF) where AF = F(A) — F(TZ); 
otherwise we remain in the reject window (which contains the current state). Having 
decided on the window W, a particular state within that window is then selected 
according to the probabilities 

P{X) = exp(-H(X) + F(W)) (14) 

The state selected becomes the next state in the Markov chain. 

When implementing the generalized algorithm, it is not necessary to save all 
the states in the accept and reject windows. One need only save the start state, so 
it can be restored after the reversed portion of the trajectory has been calculated, 
along with a single state from the accept window and a single state from the reject 
window, one or the other of which will become the next state of the Markov chain. 

In detail, let F\ be the free energy for the first i states that have been visited 
in the accept window, and let C\ be a state chosen according to the Boltzmann 
probabilities from among these first i states. These variables can be calculated 
incrementally as new states in the accept window are visited. To start, F^ = oo 
and is undefined. When we visit the i-th state in the accept window, (q«,Pi), 
we can calculate 

F\ = -log(exp(-i/(q i ,p i ))+exp(-F_V 1 )) (15) 

C i = f C'a 1 with probability exp(-Fjf 1 + F\) 
A \ (Qi,Pi) with probability exp(-i?(q i , + 

Once all states have been visited, we will have F(A) = , and Cj[ will be a state 
picked from A according to the Boltzmann probabilities. Analogous variables, F^ 
and are maintained for the reject window. Once all states have been seen, a 
decision as to whether to use the accept window or the reject window can easily be 
made, and, in either case, a state selected from the chosen window is available. 

When W = 1, the generalized algorithm is equivalent to the standard hybrid 
Monte Carlo algorithm. When W = L + 1 the procedure reduces to simply picking a 
state from those anywhere along the trajectory in accordance with their Boltzmann 
probabilities. Some simplification in the implementation is then possible. 
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5 Validity of the generalized algorithm 



To demonstrate the validity of the generalized algorithm, we need to show that the 
detailed balance condition holds for the dynamical transitions. As L, W, and eo 
are chosen from a fixed distribution, independently of the current state, we may 
choose to regard them as fixed, since if detailed balance holds for transitions with 
any values of these parameters, it will hold for a mixture of such transitions. It will 
prove necessary to average over the values selected for A and K, however. 

Detailed balance will be proved separately for transitions to a state in the accept 
window and for those to a state in the reject window. If the two windows overlap, 
as they will if W > (L + l)/2, there will for some pairs of states be the possibility of 
transitions of either type. If detailed balance holds for the two types of transitions 
individually, however, it will also hold for the combination. 

To prove detailed balance for transitions within the reject window, note first that 
the set of possible trajectories (for the various values of A and K) that start at state 
A and that include state B in the reject window is the same as the set of trajectories 
that start at B and include A in the reject window. Here, a "trajectory" is defined as 
the set of states visited, along with the subsets of states that make up the accept and 
reject windows. In detail, if for the trajectory starting at A = Xa(0), with direction 
Xa and window offset K a, we have B = Xa{J) within the reject window, then 
the identical trajectory will be produced by starting at B = Xb(0), with direction 
A_b = Aa and window offset Kb = Ka + J, and A = Xb(—J) will be in the reject 
window. 

We can thus prove detailed balance separately for each such trajectory. The 
probability of being in a small region of volume 5V around A, of then picking values 
for A and K that generate a particular trajectory for which a state in the region of 
B is in the reject window, of then chosing to pick a state from the reject window, 
and of finally chosing the state in the region of B as the next state, is as follows: 

p(dA, Pa) SV ■ ~ • -L • (1 - a(F(A)-F(K))) ■ ex P (-tf (qs, p B ) + F(K)) (17) 

The probability of generating the same trajectory starting from the region of B, and 
of then ending up in the region of A, is 

p(qs, Pb) SV-y^-(l- a(F(A)-F(K))) ■ exp(-tf (<u, Pa) + F(U)) (18) 

These are readily seen to be equal upon substituting from equation 

Detailed balance for transitions to a state in the accept window follows similarly, 
using the fact that the set of possible trajectories that start at state A and that 
include state B in the accept window is the same as the set of trajectories that start 
at B and include A in the accept window, except that in the later trajectories the 
accept and reject windows are exchanged. In detail, if for the trajectory starting at 
A = Xa(0), with window offset Ka and direction Xa, we have B = Xa{J) within 
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the accept window, Aa, while A is in the reject window, TZa, then the trajectory 
produced by starting at B = Xb{0), with window offset Kb = L — Ka — J and 
Xb = — Aa, will lead to A = Xb(J) being in the accept window, As, while B is in 
the reject window, TZb- Furthermore, we will have Aa = TZb and TZa = Ab- 

The probability of being in a small region of volume 6V around A, of then picking 
values for A and K that generate a particular trajectory for which a state in the 
region of B is in the accept window, of then chosing to pick a state from the accept 
window, and of finally chosing the state in the region of B as the next state, is as 
follows: 

p(q A , p A ) SV ■ l - • • a{F(A A )-F(TZ A )) • exp(-tf(q B , p B ) + F{A A )) (19) 

For the probability of generating the same trajectory, but with accept and reject 
windows exchanged, starting from the region of B, and of then picking a state in 
the region of A, we have 

p(q B , PB ) SV-^~- a{F{A B )-F{TZ B )) ■ exp(-tf (q A , p A ) + F(A B )) (20) 

Again, these are seen to be equal upon substituting from equations @ and (p"l|), 
remembering that A a = TZb and TZa = Ab- 

6 Acceptance probability for the generalized algorithm 

Two situations where the use of windows will increase the acceptance probability of 
the hybrid Monte Carlo algorithm are illustrated in Figure 1. 

In the trajectory of Figure 1(a), the energies of the states in the reject window, 
at the start of the trajectory, are all approximately equal to the energy of the current 
state, H$. Most of the states in the accept window, at the end of the trajectory, 
have energies in the vicinity of H + , much greater than that of the current state. 
However, one state has a much lower energy, H_. 

If the standard hybrid Monte Carlo algorithm is applied in this situation, with 
the end-point of the trajectory randomly picked from the last W states shown, the 
probability of acceptance will be approximately 1/W, since moves to states with 
energy H + would almost certainly be rejected, while a move to the one state with 
energy fl_ would be accepted. 

In contrast, if the generalized algorithm is applied with a window of size W, the 
move will always be accepted. The free energy of the reject window will be 

F(TZ) « -log(VFexp(-F )) « H Q -\og(W) (21) 

Assuming H + ^> the free energy of the accept window will be 

F{A) w -log((W-l)exp(-fl+)+exp(-fl_)) « fl_ (22) 
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Thus, if Hq — H- > log(PF), the move will be accepted, and the particular state 
chosen within the accept window will almost certainly be the one with energy H-. 
Note that it is essential to this example that a state in the accept window have 
lower energy than those in the reject window. If this is not the case, the acceptance 
probability is the same as for the standard algorithm. 

The trajectory of Figure 1(b) illustrates a different, perhaps more typical, sit- 
uation where the use of windows also increases the acceptance probability. Here, 
both windows contain approximately equal numbers of low and high energy states. 
Note that the current state is likely to be one of the low energy ones, since it is 
Boltzmann distributed. 

With the standard algorithm, the end state might equally well be one of high 
energy or one of low energy. In the former case, the move would likely be rejected, 
while in the later, it would have a good probability of being accepted. The total 
acceptance probability will thus be around 1/2. 

If the generalized algorithm is used with a window size large compared to the 
time scale of energy fluctuations, however, the free energy of both the accept and 
reject windows will be approximately equal, and the acceptance probability will be 
near one. The particular state chosen within the accept window will likely be one 
with low energy. 

These examples show that the use of windows can be beneficial, but they do 
not indicate the magnitude of the benefit, nor whether there is any improvement 
in the asymptotic form of the time requirements as system size increases. Indeed, 
for fixed values of L, eq, and W, the acceptance probability declines exponentially 
with increasing system size, as for the standard algorithm. This scaling behaviour is 
due to the free energies of the windows, and hence their difference, being extensive 
quantities that increase in proportion to system size. 

7 Performance for a system of uncoupled oscillators 

To gain some insight into the degree of benefit from using the generalized algorithm, 
and into its scaling behaviour, I have tested it empirically on systems of uncoupled 
oscillators. These simple systems are meant to model more complex systems in which 
the components are not completely independent, but interact only weakly. The 
behaviour of the standard hybrid Monte Carlo algorithm for systems of uncoupled 
oscillators has been analysed in detail by Kennedy and Pendleton Q. 
The potential energy function for such a system is 



In the Boltzmann distribution with respect to E, each qi is independent, and dis- 
tributed as a Gaussian with zero mean and standard deviation 1/cjj. Since the 




(23) 
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operation of the hybrid Monte Carlo algorithm is invariant with respect to transla- 
tions and rotations of the coordinates, these systems in fact model the behaviour of 
the algorithm as applied to any multivariate Gaussian distribution. 

Note that for the leapfrog method to be stable, with the error in H remaining 
bounded even for long trajectories, it is necessary for the step size to be less than 
2/^max, where o; max is the largest of the Wj. For efficient exploration of the other 
coordinates, the average length of a trajectory in fictitious time, T t , should be in 
the vicinity of l/u; m i n . 

In using these systems as a test bed, we must be clear concerning which aspects 
of the system we expect to carry over to realistic problems, and which will not. I 
assume here that the approximate magnitudes of uj m \ n and Lo max and the general 
distribution of the Ui are known, and that these may be used to select appropriate 
values for the step size and trajectory length. I assume that their exact values would 
not be known in a realistic system — indeed, they will not be precisely defined, since 
the components of the real system will not be completely uncoupled. To model this, 
the step size (and hence the trajectory length as well) was varied slightly at random, 
in order to avoid results that depend on precise tuning of these parameters. 

I will also assume that we are interested in systems for which the ratio uJ max /uJ m m 
is large. This ratio is a measure of the inherent difficulty of the problem, being 
(roughly) the number of leapfrog steps required to generate an independent config- 
uration. In the experiments, the length in fictitious time of a trajectory was chosen 
to be large enough for the acceptance probability to have reached equilibrium. In 
a real system, however, the appropriate trajectory length (T t l/u; m i n ) might well 
be much longer than this. 

Since chosing any particular value for T t would be arbitrary, I have for the most 
part evaluated the algorithms on the assumption that T t is very large. The "cost" 
of a particular combination of window size and average step size was accordingly 
taken to be 

C = l/(e(l-p)) (24) 

where p is the probability of a move being rejected, and e is the average step size 
(recall that the actual step sizes used will be slight perturbations around this aver- 
age). This cost measure is proportional to the number of energy gradient evaluations 
needed to generate a given number of accepted moves of average length Tt, provided 
T t is much greater than T w , the length in fictitious time of the windows. Note that 
this cost measure places no value on transitions within the reject window, though 
these presumably improve sampling at least somewhat. 

If T w is comparable to Tt, then the above measure would not be appropriate, 
since it ignores the effort expended in computing those portions of the trajectory 
that lie before the current state and after the new state. An appropriate cost in this 
case would be (1 + T w /T t ) / (e(l-p)). 

I assume that for a given window length, T w , and a given system size, N, the 
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value of e that minimizes C can feasibly be found, and that this minimal value of C 
is thus the appropriate measure of the cost of the algorithm for a particular window 
length. If the rejection probability has the functional form 

p = F(g(N)e k ) (25) 

then one can show by straightforward means that this minimum is reached at a 
value of p that is independent of N. The scaling properties of the algorithm can 
then be found by determining how e must decrease as N increases in order to keep 
the rejection probability constant. Note that we are measuring cost as the number 
of evaluations of the energy gradient, not the time required for these evaluations. 
Computation time per evaluation will vary with system size in a manner which 
depends on the application, but which will generally add an additional factor of at 
least N to the total computation time as a function of system size. 

I have tested the standard and the generalized algorithms on systems for which 
N = 100, 200, 400, 800, 1600, and 3200. In each case, the were selected randomly 
from the range 500 to 1000, with a uniform distribution for log (a;). Though I assume 
that components with much smaller u would also be present in a real system, such 
were not actually included in the simulation, in order to save time. Their inclusion 
would have had a negligible effect on the acceptance probability, since the accuracy 
of the leapfrog method is very high when 1/co is much greater than the step size. 

An average trajectory length of T t = 1 « 1000 x (l/o» max ) was used. Several of 
the simulations were done with Tj = 2 as well, and the results confirmed that the 
acceptance probability had indeed reached equilibrium at Tt = 1. 

For a given average step size, e, and for a given window size, W, the number of 
steps in the entire trajectory, L, was set so that Tt = e(L— W+l). (This calculation 
accounts for the average number of steps before and after the current and new 
states.) The trajectory was then computed with these values of L and W and with 
a value of eo randomly selected from the region within 1% of the given e. 

Runs were done of the standard algorithm, for which W = 1, and of the gen- 
eralized algorithm with a window length of T w = 0.20, for which the number of 
states in the window was W = T w /e. Some runs with T w = 0.05 and T w = 0.10 
were done as well, with results that were similar to but not quite as good as those 
for T w = 0.20. Values for e of ... , 0.000707, 0.000841, 0.001000, 0.001189, . . ., in 
geometric steps of 2 1 / 4 were used. In each run, 1000 trajectories were generated 
starting from position and momentum coordinates picked from the Boltzmann dis- 
tribution, independently for each trajectory. The proportion of rejected moves, p, 
was taken to be an estimate of the rejection probability, p. The standard error for 
this estimate is ±0.016 at p = 0.5, declining to ±0.009 at p = 0.1 or p = 0.9. An 



estimate, C, for the cost follows from equation (24). 

The results are shown in Figure 2, which plots the estimated rejection probability 
and consequent cost for each value of N and for various values of e, for both the 
standard algorithm and the generalized algorithm with T w = 0.20. When the best 
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value of e is used in each case, the cost of the generalized algorithm is roughly 
half that of the standard algorithm, with some indication that the advantage of the 
generalized algorithm may be greater for the larger N. Interestingly, the rejection 
probability with the optimal e is significantly lower for the generalized algorithm 
than for standard algorithm. 

In analysing this data further, we can first compare with the analytic results of 
||. They derive the following expression for the rejection probability of the standard 
algorithm applied to a system of uncoupled oscillators (adapted from their equation 
(2.10)): 

p » erf (JN^a/S2j (26) 

where a = N^ 1 J2i sin 2 (oj{Tt) / 4. Assuming that the small random variation in e, 
and hence Tf , is enough to randomize the phases in this sum, and using the fact that 
2^: Jq 77 sin 2 (x)dx = 5, we get that, for the experiment described here, the rejection 
probability should be 

p « erf (jNe A a /256\ (27) 

where a = iV -1 Yui^t ~ 3.38 x 10 11 . The rejection probability thus has the form 
of equation (|25|). As system size increases, e should be scaled as iV -1 / 4 in order to 
keep the rejection probability constant, and the cost will grow as iV 1 / 4 . 

In Figure 3, p is plotted against Ne 4 for all runs of the standard algorithm. As 
expected, p appears to be a function of Ne 4 , as the spread in the data points in 
comparable to the standard error. (Note that here, and in Figure 4, the standard 
error is apparently larger for smaller values of p, due to the logarithmic scale.) 
Comparison with the exact predictions of equation (^?j) (not shown in the figure) 
shows a good fit, except for a slight departure for large values of the rejection rate. 

We can hypothesize that the rejection probability of the generalized algorithm 
for a given window length will also be a function of iVe 4 . If this is the case, the cost 
for the generalized algorithm will also scale as TV 1 / 4 , though there could, as seen 
above, be an improvement in the constant factor over the standard algorithm. To 
test this hypothesis, Figure 3 also shows p plotted against A^e 4 for all runs of the 
generalized algorithm with T w = 0.20. A tendency of the curves for the larger N to 
lie below those for the smaller N is apparent, leading one to reject this hypothesis. 

The data for the generalized algorithm is better fit by the alternative hypothesis 
that the rejection probability is a function of N 7 ^ 8 e 4 . This is seen in Figure 4. 
For comparison, the data for the standard algorithm is also plotted under this 
assumption, and it is clear that in this case the fit is bad. 

If this hypothesis is true, then the step size for the generalized algorithm should 
be scaled as A^ 7 / 32 to maintain a constant rejection rate, and the cost will con- 
sequently grow with system size as N 7 ^ 32 , an improvement over the N 1 / 4 scaling 
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of the standard algorithm. This result must be regarded as tentative, however. It 
seems possible that for very large N the window length, T w , might have to increase 
at some rate in order to maintain good scaling behaviour. This would ultimately 
affect the cost, once T w became comparable to Ti. Behaviour could also conceivably 
depend on the exact distribution of the Wj. A theoretical analysis is thus needed to 
gain a better understanding of the performance of the generalized algorithm. It is 
clear, however, that at the very least, it can improve performance by a significant 
constant factor. 

8 Variations on the algorithm 

Two variations on the generalized algorithm described here are worth mentioning. 

First, when a move to the accept window is rejected, it is valid to simply re- 
main at the current state, rather than selecting a new state from the reject window 
according to the Boltzmann probabilities. This follows from the fact that detailed 
balance was shown above to hold independently for the accept and reject transitions. 
Detailed balance thus continues to hold if all reject transitions are eliminated. 

With this variation, the overhead of saving states is somewhat reduced. Typi- 
cally, however, this overhead is small compared to the cost of evaluating the gradient 
of the energy, and its elimination may not be worth giving up the additional explo- 
ration provided by the reject transitions. Note that it is in any case necessary to 
visit all the states in the reject window, in order to compute its free energy. 

A second variation may be useful when the ideal step size is not known a priori. 
In such cases, we may have to use step sizes selected at random from a fairly broad 
distribution. Trajectories computed with a step size that is too large will result in 
large changes in H and are unlikely to be accepted. 

To save computation, we can terminate such trajectories early, stopping when- 
ever a single leapfrog step changes H by an amount, positive or negative, whose 
magnitude is greater than some threshold. The state reached after this large change 
in H is not included. Such truncated trajectories will have smaller than normal 
accept and/or reject windows, but examination of the proof of validity in Section 5 
shows that they may validly be treated the same way as normal trajectories. The 
accept window for a truncated trajectory may, in fact, be null, in which case the 
move is rejected. (The reject window will always contain at least the current state.) 

This variation is applicable to the standard hybrid Monte Carlo algorithm, where 
the window size is one. In this case, all truncated trajectories are rejected, with the 
current state remaining unchanged. 
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Figure 1: Two situations where the use of windows increases the acceptance prob- 
ability. The graphs show the change in total energy along two hypothetical trajec- 
tories. 



16 



